An Investigation of the Standard Errors of Expected A Posteriori 

Ability Estimates 



ABSTRACT 

Expected a posteriori (EAP) has a number of advantages over maximum likelihood estimation 
(MLE) or maximum a posteriori (MAP) estimation methods. These include ability estimates 
(θ̂s) for all response patterns including zero and perfect score patterns, less regression 
towards the mean than MAP ability estimates, and an average squared error that is less than 
that for MAP and MLE θ̂s. Bock and Mislevy (1982) state that the posterior standard deviation 
(PSD(θ̂)) is virtually interchangeable with the standard error (SEE). A typical criterion for 
terminating an adaptive test is when the θ̂'s SEE is equal to or less than a predetermined 
value. However, if there are conditions in which the PSD(θ̂) is not interchangeable with the 
SEE, then the adaptive test may not be validly terminated. Moreover, in applications where an 
examinee must be classified on the basis of his/her ability estimate (e.g., as a master versus 
nonmaster) one typically creates a confidence interval about the examinee's ability estimate 
using the θ̂'s SEE. The use of the PSD(θ̂) in these situations may lead to incorrect 
classifications if the PSD(θ̂) does not agree with the SEE. Results of this Monte Carlo study 
showed that the use of 10 quadrature points tends to result in PSD(θ̂)s which underestimate 
the observed standard error. The use of 80 quadrature points, given the test's length 
(possibly 2√(test length) quadrature points under certain conditions), is recommended where 
accurate PSD(θ̂)s are required. 
Item response theory (IRT) has emerged as a popular approach for solving various 
measurement problems, such as test design, test equating, and computerized adaptive testing 
(CAT), and IRT techniques are becoming more common in practical testing situations. For 
example, certification boards such as the American Society of Clinical Pathologists have 
established an IRT-based CAT system for certification (Bergstrom & Lunz, 1991). Unlike the 
conventional paper-and-pencil test in which an examinee, regardless of ability, is administered 
all test items, CAT is a procedure for administering tests that are individually tailored for each 
examinee. Advantages of IRT-based CAT over paper-and-pencil testing have been well 
documented (e.g., Wainer, 1990; Weiss, 1982). Although not necessary, a CAT system typically 
uses an IRT model in combination with item characteristics to estimate the examinee’s ability. 

Ability estimation in CAT has typically used one of three methods: maximum likelihood 
estimation (MLE) or Bayesian approaches such as maximum a posteriori (MAP, also known as 
Bayes Modal Estimate) and expected a posteriori (EAP, also known as Bayes Mean Estimate). The 
first two algorithms are iterative techniques, while EAP is noniterative and is based on 
numerical quadrature methods. Because it is noniterative (and efficient) it is potentially faster 
than either MLE or MAP in ability estimation. The obvious implication of EAP's efficient 
estimation for CAT is the transparency (as far as the examinee is concerned) of estimating the 
examinee's ability in real time, particularly with more complicated IRT models (e.g., polytomous 
IRT models). Moreover, unlike MLE ability estimates, EAP ability estimates may be obtained for 
all response patterns, including zero and perfect score patterns (Mislevy & Stocking, 1989). 

While MAP ability estimates also exist for all response patterns, they suffer from greater 
regression towards the mean than do the EAP estimates (Bock & Mislevy, 1982; Mislevy & Bock, 
1982). Moreover, in the early stages of an adaptive test the EAP estimate is more stable than the 
MAP estimate and the average squared error for EAP estimates over the population of ability is 
less than that for MAP and MLE ability estimates (Bock & Mislevy, 1982). From an 
implementation perspective, an additional advantage is the simplicity of the mathematics 
required for deriving the computational forms for ability estimation with polytomous IRT models. 

The EAP estimate (Bock & Mislevy, 1982) of an examinee's ability, θ, after n items have been 
administered is given by 

$$\hat{\theta} = \frac{\sum_{k=1}^{q} X_k L_n(X_k) A(X_k)}{\sum_{k=1}^{q} L_n(X_k) A(X_k)} \qquad (1)$$

and its posterior standard deviation is 

$$\mathrm{PSD}(\hat{\theta}) = \left[ \frac{\sum_{k=1}^{q} (X_k - \hat{\theta})^2 L_n(X_k) A(X_k)}{\sum_{k=1}^{q} L_n(X_k) A(X_k)} \right]^{1/2} \qquad (2)$$

where X_k is one of q quadrature points, A(X_k) is the quadrature weight associated with X_k, and 
L_n(X_k) is the likelihood function of X_k given the response pattern (x_1, x_2, ..., x_n). For example, if 
the probability of a correct response by an individual with ability θ to a dichotomously scored item i 
with location b_i is given by the one-parameter logistic (1PL) model 

$$P(x_i = 1 \mid \theta) = \frac{e^{(\theta - b_i)}}{1 + e^{(\theta - b_i)}} \qquad (3)$$



then the likelihood of θ given the response pattern (x_1, x_2, ..., x_n) is 

$$L_n(\theta) = \prod_{i=1}^{n} P(x_i = 1 \mid \theta)^{x_i} \left(1 - P(x_i = 1 \mid \theta)\right)^{1 - x_i} \qquad (4)$$

The X_ks and A(X_k)s may be obtained from tables provided by Stroud and Secrest (1966) for 
approximating the Gaussian error function. The Stroud and Secrest Gauss-Hermite X_ks and A(X_k)s 
must be multiplied by √2 and 1/√π (Bock & Lieberman, 1970), respectively, in order to place them 
on the normal function scale. However, programs such as BILOG (Mislevy & Bock, 1982) do not use 
the Stroud and Secrest values for EAP ability estimation; neither BILOG nor MULTILOG (Thissen, 
1988) uses these values for obtaining item parameter estimates via marginal maximum likelihood 
estimation (MMLE). Rather, a specified range of the θ continuum (e.g., -4.0 to 4.0) is divided into q 
equidistant discrete points (these points serve as the X_ks) and the standard unit normal 
probability density is computed at each of the q points. The probability density at X_k multiplied 
by the difference between successive quadrature points (e.g., X_{k+1} - X_k) is the quadrature weight 
A(X_k). Because of the symmetric nature of the discrete prior distribution the A(X_k)s only need to 
be calculated for the X_ks < 0. (Seong (1990a) refers to this method as the "Mislevy histogram" 
technique, although it is probably more accurate to refer to it as the Mislevy "vertical line graph" 
method to emphasize the discrete nature of the prior distribution.) Seong (1990a) has compared the 
item and ability parameter estimates obtained by using this latter technique with those obtained by 
the Stroud and Secrest values. Seong found that when a large number of quadrature points was 
used (e.g., 30 or 40) the two methods estimated item and ability parameters equally well, but when 
a small number of quadrature points was specified (e.g., 10), the Mislevy histogram solution 
estimated item and ability parameters more accurately than the Gauss-Hermite quadrature formula. 
It should be noted that Bock and Mislevy (1982) state that the Gauss-Hermite values do not include 
the likelihood functions found in adaptive testing. Moreover, the X_ks and A(X_k)s must satisfy the 
constraints that ΣA(X_k) = 1.0, ΣX_k A(X_k) = 0.0, and ΣX_k² A(X_k) = 1.0. 
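
The equidistant-point scheme and Equations (1) and (2) can be sketched in a few lines of Python. This is a minimal illustration, not the BILOG implementation: the function names are hypothetical, the grid is assumed to include both endpoints of the range, and the weights are assumed to be rescaled so that they sum to exactly 1.0 (the text states the constraint but not how it is enforced).

```python
import math

def equidistant_quadrature(q, lo=-4.0, hi=4.0):
    """Return q equally spaced points on [lo, hi] and normal-density weights."""
    step = (hi - lo) / (q - 1)  # assumption: both endpoints are quadrature points
    points = [lo + k * step for k in range(q)]
    # Standard normal density at each point times the spacing between points.
    raw = [math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi) * step for x in points]
    total = sum(raw)
    # Assumption: rescale so that the weights sum to exactly 1.0.
    weights = [w / total for w in raw]
    return points, weights

def likelihood_1pl(theta, responses, difficulties):
    """Likelihood of theta under the 1PL model, Equations (3) and (4)."""
    like = 1.0
    for x, b in zip(responses, difficulties):
        p = 1.0 / (1.0 + math.exp(-(theta - b)))
        like *= p if x == 1 else (1.0 - p)
    return like

def eap_and_psd(responses, difficulties, points, weights):
    """EAP estimate, Equation (1), and posterior standard deviation, Equation (2)."""
    post = [likelihood_1pl(x, responses, difficulties) * a
            for x, a in zip(points, weights)]
    denom = sum(post)
    eap = sum(x * p for x, p in zip(points, post)) / denom
    var = sum((x - eap) ** 2 * p for x, p in zip(points, post)) / denom
    return eap, math.sqrt(var)

points, weights = equidistant_quadrature(80)
mean = sum(x * a for x, a in zip(points, weights))        # should be near 0.0
second = sum(x * x * a for x, a in zip(points, weights))  # should be near 1.0

# A perfect score on a 61-item test still yields a finite EAP estimate.
difficulties = [-3.0 + 0.1 * i for i in range(61)]
eap, psd = eap_and_psd([1] * 61, difficulties, points, weights)
print(sum(weights), mean, second, eap, psd)
```

Because the grid is truncated at ±4.0, the second moment of the discrete prior falls slightly short of 1.0; the constraint is met only approximately under these assumptions.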

Bock and Mislevy's (1982) work showed that EAP produces reasonably accurate ability 
estimates. Originally, Bock and Mislevy presented EAP for use in adaptive testing; however, in the 
calibration program BILOG (Mislevy & Bock, 1982) it is the default ability estimation approach. 

In adaptive testing the PSD(θ̂) plays the same role as the MLE's standard error (Bock & Mislevy, 
1982). That is, after 20 items the likelihood function and the posterior distribution are nearly 
identical and the PSD(θ̂) is virtually interchangeable with the standard error (Bock & Mislevy, 
1982); this interchangeability is reflected in the fact that the PSD(θ̂)s are labeled as standard 
errors in the BILOG EAP output. For consistency with and on the basis of Bock and Mislevy 
(1982), the EAP PSD(θ̂) will be referred to as if it were a standard error and will be labeled as 
EAP SEE in the following. 

A number of studies have investigated the effects of various factors on MMLE item parameter 
estimation (e.g., Drasgow, 1989; Harwell & Janosky, 1991; Zwinderman & van der Wollenberg, 
1990). Seong (1990b) evaluated both item parameter estimation and EAP ability estimation. With 
respect to EAP θ̂s, Seong found that increasing the number of quadrature points from 10 to 20 
produced more accurate θ̂s, regardless of sample size and appropriateness of the prior 
distribution (i.e., normal, positively and negatively skewed). Because abilities are estimated 
independently of one another it is not surprising that sample size did not have a significant effect 
on the accuracy of EAP θ̂s. Because of the breadth of Seong's study, the EAP estimation findings 
were limited. For instance, test length should affect ability estimation, but was held fixed at 45 
items in Seong's study. Moreover, Seong studied the accuracy of the θ̂s in terms of root mean 
square error (not EAP SEE), but in applications where an examinee must be classified on the basis 
of his/her ability estimate (e.g., as a master versus nonmaster) one typically creates a confidence 
interval about the examinee's ability estimate using the θ̂'s SEE. As an example, in the American 
Society of Clinical Pathologists' CAT, pathologists are presented with an adaptive certification test. If 
the confidence interval for an examinee falls either completely above or completely below the cut 
point, then the examinee may be classified as a master (i.e., certified) or a nonmaster, 
respectively. If the confidence interval spans the cut point, then additional information is 
needed (e.g., more test questions could be asked). The use of confidence intervals incorporates 
our uncertainty about the ability estimate. It should also be noted that in addition to using the 
EAP PSD(θ̂) as if it were a standard error, the PSD(θ̂) calculated by (2) is actually an estimate or an 
approximation and its use for forming confidence intervals may be problematic if the EAP SEE is 
not accurate. Moreover, a typical criterion for terminating an adaptive test is when the θ̂'s SEE is 
equal to or less than a predetermined value. If the EAP SEE is not accurate, then the adaptive test 
may not be validly terminated. For these reasons this study was primarily concerned with the 
validity of the EAP SEEs. 
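
The classification and termination rules described above can be sketched as follows. This is an illustrative sketch only: the function names, the cut point of 0.5, the multiplier z = 1.96, and the numerical values are all hypothetical and are not taken from any operational CAT system.

```python
def classify(theta_hat, see, cut, z=1.96):
    """Classify an examinee from a confidence interval about the ability estimate."""
    lower = theta_hat - z * see
    upper = theta_hat + z * see
    if lower > cut:
        return "master"            # interval lies completely above the cut point
    if upper < cut:
        return "nonmaster"         # interval lies completely below the cut point
    return "continue testing"      # interval spans the cut point

def should_stop(see, target_see):
    """Fixed-precision stopping rule: stop once the SEE reaches the target value."""
    return see <= target_see

# Same ability estimate, two different standard errors (hypothetical values).
print(classify(0.9, 0.30, 0.5))  # interval (0.312, 1.488) spans the cut point
print(classify(0.9, 0.20, 0.5))  # interval (0.508, 1.292) lies above the cut point
```

The two calls show why an understated SEE matters: with the same ability estimate, a reported SEE of 0.20 in place of 0.30 turns a "continue testing" decision into a "master" classification.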

Because EAP is based on numerical quadrature methods it requires the specification of a 
number of factors, such as type of prior distribution and the number of quadrature points to 
use in estimation. Each of these factors as well as the test length and the form of the examinees' 
latent distribution may affect the accuracy of the EAP ability estimate and the EAP SEE. This 
study investigated the effects of the number of quadrature points (10, 2√(test length), and 80), 
test length (61 and 122 items), latent ability distribution (bimodal, normal, positively skewed, 
and uniform), and the form of the prior ability distribution (normal and uniform) on the EAP 
SEEs. The 2√(test length) and 80 quadrature point levels were chosen because 
2√(test length) is the default value in BILOG for EAP estimation (a normal prior is also the default) 
and according to Bock and Mislevy (1982, p. 433) "In applications to real populations, perhaps 
80 quadrature points between ±4.0 standard deviations should be available to insure precision 
down to J = 0.2" (although for their simulation they used 21 quadrature points). A bimodal 
latent ability distribution was used to simulate an examinee population that consists of masters 
and nonmasters, and the rationale for the test lengths is presented below. 

METHOD 

Program. A program was written for generating simulees, generating the responses for each 
simulee, performing ability estimation for each simulee, and compiling various summary 
statistics for each simulee as well as across simulees. 

Data. For each of the 4 latent distributions, 100 simulees were sampled from the appropriate θ 
distribution. Then for each simulee at each combination of test length, prior distribution and 
quadrature points, the process of administering a simulated test, as described below, was 
repeated 1000 times. The standard unit normal curve was used as the θ distribution for the 
normal condition, a beta distribution (ν1 = 1.25, ν2 = 10) was used to produce the positively 
skewed θ distribution (skew = 1.14), and the uniform θ distribution was restricted to the range 
-3.0 < θ < 3.0. The bimodal θ distribution was obtained by generating one-half the sample's 
simulees from a beta distribution with ν1 = 1.25 and ν2 = 10 and one-half from a beta distribution 
with ν1 and ν2 transposed. Each latent ability distribution had a unique seed for generating its 
simulees. 

A sixty-one item pool was generated to have uniform difficulty parameters (b) in the range 
-3.0 ≤ b ≤ 3.0 in 0.1 logit increments (i.e., b_1 = -3.0, b_2 = -2.9, etc.). The discrimination (a) and 
the pseudo-guessing (c) parameters were set at 1.0 and 0.0, respectively. The use of these values 
for a and c is discussed below. The 122-item test consisted of the 61-item test replicated and 
therefore the 122-item test information function was twice that of the 61-item test. 
For each simulee, responses were generated using the appropriate item parameters and the 
simulee's θ to calculate the probability of obtaining the item correct according to the 1PL model. 
This probability was compared to a random number obtained from a uniform distribution [0,1]. If 
the probability was greater than the random number then the simulee's response was 1 (i.e., 
correct), otherwise the simulee's response was incorrect and coded as 0. 
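
The response-generation step can be sketched as below. The function name, the seed, and the example ability of 0.5 are hypothetical; Python's `random.Random.random()` draws from [0, 1), which is used here as a stand-in for the uniform [0,1] draw described in the text.

```python
import math
import random

def simulate_responses(theta, difficulties, rng):
    """Generate 0/1 responses for one simulee under the 1PL model."""
    responses = []
    for b in difficulties:
        p = 1.0 / (1.0 + math.exp(-(theta - b)))  # probability of a correct response
        u = rng.random()                           # uniform random number
        responses.append(1 if p > u else 0)        # correct when p exceeds the draw
    return responses

rng = random.Random(12345)  # hypothetical seed, for reproducibility only
difficulties = [-3.0 + 0.1 * i for i in range(61)]
responses = simulate_responses(0.5, difficulties, rng)
print(sum(responses), "correct out of", len(responses))
```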

After the simulee had been administered a test of the appropriate length an EAP θ̂ and its 
EAP SEE were obtained using the appropriate prior distribution and number of quadrature points. 
This process was repeated 1000 times for each simulee (i.e., there were 1000 θ̂s for each of the 
100 θs in each of the 48 cells in the design). 

Estimation. EAP ability estimates were calculated according to (1) and the EAP SEE was obtained 
according to (2). For the three levels of the number of quadrature points factor (10, 2√(test length), and 
80) the X_ks and A(X_k)s were determined using the Mislevy "vertical line graph" method described 
above for the range -4.0 < θ < 4.0. For the 61-item test 2√(test length) = 16 and for the 122-item 
test 2√(test length) = 23. 
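
The reported values of 16 and 23 are consistent with rounding 2√(test length) up to the next integer, since 2√61 ≈ 15.62 and 2√122 ≈ 22.09. The snippet below makes that arithmetic explicit; the rounding-up rule is an inference from the two reported values, not something the text states, and the function name is hypothetical.

```python
import math

def default_quadrature_points(test_length):
    """2 * sqrt(test length), rounded up (assumed rounding rule)."""
    return math.ceil(2.0 * math.sqrt(test_length))

q_61 = default_quadrature_points(61)    # 2 * 7.81 = 15.62, rounded up
q_122 = default_quadrature_points(122)  # 2 * 11.05 = 22.09, rounded up

# Design cells: 4 latent distributions x 2 test lengths x 2 priors x 3 quadrature levels.
cells = 4 * 2 * 2 * 3
print(q_61, q_122, cells)
```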

Analyses. In addition to obtaining the EAP SEE, the standard deviation of the 1000 θ̂s (i.e., 
the empirical SEE) for a given θ was calculated. The basic design of the study was a four-way 
repeated measures design with the difference between the empirical and EAP SEEs (i.e., 
SEE_empirical - SEE_EAP) as the dependent variable, latent ability distribution as the between 
subjects factor, and test length, type of prior distribution, and number of quadrature points as 
the within subjects factors. 

In addition to calculating the empirical SEE, 68% and 95% confidence intervals (CIs) based on 
the EAP SEE were calculated and the number of times the 68% and 95% CIs contained θ were 
counted (CI68% and CI95%, respectively). Analysis of the CIs involved calculating the difference 
between the number of times a given CI contained θ and the number of times the CI was expected to 
contain θ (i.e., diff68% = CI68% - 680 and diff95% = CI95% - 950). The analyses of diff68% and 
diff95% were treated separately. Diff68% and diff95% were used as the dependent variable in a 
four-way repeated measures analysis with test length, type of prior distribution, and number of 
quadrature points as the within subjects factors and latent ability distribution as the between 
subjects factor. 
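
The confidence interval counts can be sketched as follows. The multipliers z = 1.0 for the 68% interval and z = 1.96 for the 95% interval are assumptions (the text does not give them), the function names are hypothetical, and the four estimates are toy values rather than study data. With 1000 replications the expected counts would be 680 and 950.

```python
def coverage_count(theta, estimates, sees, z):
    """Count the intervals estimate +/- z * SEE that contain the true theta."""
    count = 0
    for est, see in zip(estimates, sees):
        if est - z * see <= theta <= est + z * see:
            count += 1
    return count

def coverage_diff(theta, estimates, sees, z, nominal):
    """Observed count minus expected count, e.g. diff68% = CI68% - 680."""
    expected = nominal * len(estimates)
    return coverage_count(theta, estimates, sees, z) - expected

# Toy replications for a simulee with true theta = 0.0 (not study data).
estimates = [0.1, -0.5, 0.3, 0.9]
sees = [0.2, 0.2, 0.2, 0.2]
ci68 = coverage_count(0.0, estimates, sees, z=1.0)
ci95 = coverage_count(0.0, estimates, sees, z=1.96)
print(ci68, ci95, coverage_diff(0.0, estimates, sees, 1.96, 0.95))
```

A negative difference means fewer intervals covered θ than the nominal rate implies, which is the pattern an understated SEE would produce.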

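The accuracy of ability estimation was assessed by root mean square error (RMSE) and Bias. 
RMSE and Bias were calculated according to: 

$$\mathrm{RMSE}(\hat{\theta}) = \sqrt{\frac{\sum (\hat{\theta}_k - \theta)^2}{N_k}} \qquad (5)$$

$$\mathrm{Bias}(\hat{\theta}) = \frac{\sum (\hat{\theta}_k - \theta)}{N_k} \qquad (6)$$
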
where θ̂_k is the ability estimate for simulee k with latent ability θ, and N_k is the number of 
replications for simulee k (i.e., N_k = 1000). RMSE was used as the dependent variable in a four-way 
repeated measures analysis with test length, number of quadrature points, and type of prior 
distribution as the within subjects factors and latent ability distribution as the between subjects factor. 
The analysis of Bias was treated similarly. Descriptive statistics were calculated on the θs and θ̂s 
as well as on the EAP and empirical SEEs, the difference between SEEs, CI68% and CI95%. 
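
Equations (5) and (6) translate directly into code. The sketch below uses hypothetical function names and four toy estimates rather than the 1000 replications of the study; it is meant only to show how positive and negative errors cancel in Bias but not in RMSE.

```python
import math

def rmse(theta, estimates):
    """Root mean square error of the estimates about the true theta, Equation (5)."""
    return math.sqrt(sum((e - theta) ** 2 for e in estimates) / len(estimates))

def bias(theta, estimates):
    """Mean signed error of the estimates about the true theta, Equation (6)."""
    return sum(e - theta for e in estimates) / len(estimates)

# Toy replications for a simulee with true theta = 0.5 (not study data).
estimates = [0.4, 0.6, 0.7, 0.3]
r = rmse(0.5, estimates)
b = bias(0.5, estimates)
print(r, b)
```

Here the errors are -0.1, 0.1, 0.2, and -0.2, so Bias is essentially zero while RMSE is about 0.158.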

Fidelity coefficients (r_θθ̂) were obtained. 

To summarize, the effect of the four factors (latent ability distribution, prior distribution, 
test length, and number of quadrature points) on the EAP and empirical SEEs was investigated 
using a four-way repeated measures design for SEE. The two CIs were each analyzed using a four-way 
repeated measures analysis with diff68% and diff95% as the dependent variables. Accuracy 
of ability estimation was assessed using a four-way repeated measures analysis with RMSE and 
Bias as the dependent variables. Because of its relaxed assumptions a multivariate approach was 
used for all repeated measures analyses. 

RESULTS 

Tables 1 and 2 contain the descriptive statistics on the θs and θ̂s as well as the r_θθ̂. As 
can be seen from Table 1, increasing the number of quadrature points from 10 to 2√(test length) 
and to 80 resulted in the mean and median θ̂ becoming more similar to the mean and median θ, 
respectively, regardless of latent distribution, prior distribution, and test length level. 

Table 2 shows that the r_θθ̂ also increased as the number of quadrature points increased from 
10 to 80 nodes regardless of latent distribution, prior distribution, and test length level. 
However, these increases in r_θθ̂ may not be considered meaningful by some because of the 
magnitude of the r_θθ̂ at the 10 quadrature point level. 

Insert Tables 1 and 2 about here 



Descriptive statistics on the empirical and EAP SEEs are presented in Table 3. This table 
shows that increasing the number of quadrature points led to a decrease in the mean empirical 
SEEs regardless of test length, prior distribution, and latent distribution. As would be expected, 
doubling the test length led to a decrease in the average SEEs for all levels of the number of 
quadrature points factor. Furthermore, for a given latent and prior distribution the mean 
empirical SEE for the 10 quadrature point level/122-item test length was, typically, 
approximately the same size as the average SEE for the 16 quadrature point level/61-item test 
length and in certain conditions less than those for the 80 quadrature point level at the shorter 
test length. In general, as the number of quadrature points increased the average empirical SEEs 
decreased. In contrast, the EAP SEEs showed the opposite pattern with respect to increasing the 
number of quadrature points. Specifically, increasing the number of quadrature points led to an 
increase in the mean EAP SEEs. A comparison of the EAP and empirical SEEs shows that, 
regardless of test length, latent and prior distribution, the mean EAP SEEs for the 2√(test length) 
and 80 quadrature point levels had a tendency to be in good agreement with the mean empirical 
SEEs. However, the average EAP SEEs tended to underestimate the mean empirical SEEs when 10 
quadrature points were used for estimation, but as the number of quadrature points increased the 
EAP SEEs and empirical SEEs came into closer agreement. As was the case with the empirical 
SEEs, doubling the test length had the expected effect of decreasing the average EAP SEEs. The 
discrepancy between the EAP and empirical SEEs was greatest for the positively skewed latent 
ability distribution. 

Insert Table 3 about here 



The descriptive statistics on CI68% and CI95% are presented in Table 4. Given the SEE 
results it is not surprising that the CI68% and CI95% were affected by the number of quadrature 
points. It is only when 80 quadrature points were used for ability estimation that the CI68% and 
CI95% approximated their expected values of 680 and 950, respectively, regardless of test length, 
prior and latent distributions. 

Insert Table 4 about here 



Table 5 contains the descriptive statistics on RMSE(θ̂) and Bias(θ̂). For the normal, positively 
skewed, and uniform latent distributions increasing the number of quadrature points from 10 to 
80 nodes led to more accurate θ̂s on average. However, for the bimodal condition there was a slight 
increase in the mean RMSE(θ̂) as the number of nodes increased from 2√(test length) to 80. For a 
given number of quadrature points and independent of the latent and prior distributions, 
doubling the test length resulted in a decrease in the average RMSE(θ̂). 

The mean Bias(θ̂) values tended to be about 0.0 (range of -0.074 to 0.021) and inspection of the 
corresponding histograms showed that these distributions tended to be somewhat unimodal and 
symmetric about 0.0. There were five instances of bimodal distributions (3 associated with the 
bimodal and 2 with the uniform latent ability distributions) and these occurred with the use of a 
normal prior. Table 5 also shows that there was a slight underestimation Bias(θ̂) for the bimodal 
and positive skew θ distributions and, in general, a slight overestimation for the normal and 
uniform latent ability distributions. The standard deviations of Bias(θ̂) were correspondingly 
small and decreased with increasing number of quadrature points, indicating that the average 
Bias(θ̂) value was a "typical" Bias(θ̂) value and not atypically low because of the compensation that 
takes place in its calculation (see (6)). 

Insert Table 5 about here 
The repeated measures analyses on the SEE difference are presented in Table 6. As can be 
seen the magnitude of the difference between the EAP and empirical SEEs was affected by type 
of the latent distribution, the test length, and the type of prior distribution as well as the 
number of quadrature points used in estimation. Post hoc analyses on the SEE difference (Table 
7) showed that for the bimodal and uniform latent distribution conditions increasing the 
number of quadrature points from 2√(test length) to 80 did not result in a significant 
improvement in the agreement between the EAP and empirical SEEs. This was also true for the 
normal and positively skewed θ distributions, but only for the 122-item test. However, the use 
of the 61-item test with these two θ distributions showed that increasing the number of 
quadrature points from 2√(test length) to 80 did result in a significant improvement in the 
agreement between the EAP and empirical SEEs. It should be noted that the disagreement 
between EAP and empirical SEEs for the normal and positively skewed θ distributions using a 
61-item test with 2√(test length) quadrature points is less than 0.044 (Table 3). 

Insert Tables 6 and 7 about here 



Table 7 also shows that for EAP estimation based on 80 quadrature points and a uniform prior 
distribution doubling the test length did not result in a significant improvement in the agreement 
between EAP and empirical SEEs. This pattern held for the normal prior except for the uniform 
latent distribution condition where the test statistic was marginally significant. The use of 
2√(test length) quadrature points, a uniform prior, and 122-item test produced significantly 
greater agreement between the EAP and empirical SEEs for all latent ability distributions. There 
was not as clear a pattern for the other conditions and while it was expected that when the prior 
distribution matched the latent ability distribution there would be better agreement between the 
EAP and empirical SEEs than when there was a mismatch, this pattern did not appear. It should 
be noted that the magnitude of the SEE differences was comparatively small for the 2√(test length) 
and 80 quadrature point conditions (i.e., discrepancies in the hundredths and thousandths) and 
only at 10 quadrature points were these discrepancies occurring at the first decimal place. In 
this regard as well as with respect to the power of the tests, some of the significant post hoc tests may 
not be considered meaningful by some. 

Figure 1 contains the test length by quadrature points by prior distribution interaction plot 
for each latent ability distribution. The plots clearly show (a) the convergence of empirical and 
EAP SEEs as the number of quadrature points increased; (b) for all θ distributions the SEEs for 
the 122-item test were less than those for a test half as long for a given quadrature point, prior 
distribution and type of SEE (i.e., empirical or EAP) level; and (c) for a given quadrature point 
level and for a SEE type, the use of a uniform prior resulted in larger values than the use of a 
normal prior at the 61-item test length (this difference appeared to disappear at the 122-item 
test length). 

Insert Figure 1 about here 

Table 8 contains the repeated measures analyses for the confidence intervals. For diff68% all 
first-order interactions and the latent ability distribution by test length by number of 
quadrature points interaction were significant, whereas for diff95% the four-way interaction of 
latent ability distribution, test length, number of quadrature points, and type of prior 
distribution was significant. 

Insert Table 8 about here 



Post hoc analysis of the effect of type of prior distribution used in estimation on diff68% 
(Table 9) showed that for a given test length the use of a uniform prior distribution, rather than a 
normal prior, led to significantly better agreement between the average number of 68% CIs 
containing θ and their expected value of 680. However, for a given prior distribution doubling the 
test length led to a significant increase in the mean number of 68% CIs not containing θ. 

Inspection of Table 4 showed that these significant results were associated with poorer 
performance (i.e., poorer agreement between the number of 68% CIs containing θ and 
their expected value of 680) at the 10 quadrature point level for the 122-item test than at the 
61-item test length, regardless of the prior and latent ability distribution. When the number of 
quadrature points is increased from 10 to 2√(test length) or greater, then doubling the test 
length produces better agreement between the number of 68% CIs containing θ and their 
expected value of 680 at all levels of prior and latent ability distribution. Moreover, although for 
a given prior distribution increasing the number of quadrature points led to significant 
improvement, only when 10 quadrature points were used for estimation was the choice of prior 
distribution relevant. For instance, the use of 10 quadrature points resulted in significantly 
better agreement between the average number of 68% CIs containing θ and their expected value of 
680 when a uniform prior distribution was used instead of a normal distribution. However, when 
80 quadrature points were used for EAP estimation the mean diff68% when a normal prior was 
used was -0.808 and for a uniform prior it was 0.518 and the choice of prior was irrelevant. 

While at the 2√(test length) level there was no significant difference for type of prior 
distribution, there were, on average, 59.71 fewer CI68% containing θ than would be expected 
when a normal prior was used and when a uniform prior was used the mean diff68% was -55.80. 
Therefore, only at the 80 quadrature point level was the number of CI68% containing θ 
approaching the expected value of 680. 
Insert Tables 9 and 10 about here 



Analysis of the CI95% (Table 10) showed that increasing the number of quadrature points 
from 10 to 2√(test length) resulted in significantly more 95% CIs approaching their expected 
value of 950 regardless of test length, type of prior distribution, and θ distribution. In addition, 
increasing the number of quadrature points from 2√(test length) to 80 led to a significant 
reduction in the mean diff95% for the 61-item test with the use of a normal prior for all latent 
ability distributions. This was also true when a uniform prior was used with a 61-item test and 
when the θ distributions were normal or positively skewed. While there was not a significant 
difference between the increase from 2√(test length) to 80 quadrature points for certain 
conditions (e.g., uniform θ distribution, 122-item test length), a comparison with Table 4 showed 
that the magnitude of the mean difference for these nonsignificant cells was, at most, 1.9 (the 
uniform θ distribution, uniform prior, 122-item test length cell). That is, for these 
nonsignificant cells and when 80 quadrature points were used for estimation there were, on 
average, 1.9 95% CIs that did not contain θ and overall there were at most 3.6 95% CIs that did not 
cover the parameter. Therefore, while the difference between 2√(test length) and 80 quadrature 
points may not be significant, in practice the number of CIs which contain θ when 80 as opposed to 
2√(test length) quadrature points were used for estimation may be considered meaningful by 
some. For CI68% and using 80 quadrature points for estimation there were at most, on average, 
7.2 68% CIs that did not include θ (Table 4). 

Table 11 contains the repeated measures analyses for RMSE(θ̂) and Bias(θ̂). These 
results showed that the second-order interactions for RMSE(θ̂) were significant, while Bias(θ̂) 
was affected by the interaction of θ distribution, test length, number of quadrature points, 
and type of prior distribution used. Post hoc analyses for RMSE(θ̂) (Table 12) revealed that 
there was not a significant interaction between type of prior distribution and the number of 
quadrature points within latent ability distribution. In general, increasing the number of 
quadrature points from 10 to 2√(test length) and from 10 to 80 led to a significant reduction 
in the mean RMSE(θ̂), but increasing from 2√(test length) to 80 quadrature points did not 
result in significantly more accurate θ̂s, regardless of θ distribution (cf. Table 5). The use of 
a uniform prior instead of a normal prior led to a significant increase in the average RMSE(θ̂); 
however, the magnitude of these increases may not be considered meaningful by some 
individuals (e.g., for the bimodal, normal, positive skew, and uniform θ distributions the mean 
RMSE(θ̂) were 0.2265 (normal) vs 0.2334 (uniform), 0.2554 (normal) vs 0.2622 (uniform), 
0.2731 (normal) vs 0.2806 (uniform), and 0.2485 (normal) vs 0.2544 (uniform), respectively). 

Insert Table 11 about here 
The analysis of the quadrature points by test length within θ distribution interaction 
showed a significant quadrature points by test length interaction. For all levels of the 
quadrature points factor increasing the test length from 61 to 122 items produced 
significantly more accurate θ̂s, regardless of latent ability distribution. For all latent ability 
distributions, except for the positive skew θ distribution, increasing the number of 
quadrature points from 10 to 2√(test length) and from 10 to 80 led to a significant reduction 
in the mean RMSE(θ̂), but increasing from 2√(test length) to 80 quadrature points did not 
result in significantly more accurate θ̂s, regardless of θ distribution and test length. 

Insert Table 12 about here 



Within latent ability distribution there was not a significant test length by type of prior 
distribution interaction. As was the case with the quadrature points by test length within θ 
distribution interaction, doubling the test length led to a significant reduction in the average 
RMSE(θ̂) for all θ distributions. Furthermore and similar to the prior distribution by the number 
of quadrature points within latent ability distribution interaction, the use of a uniform prior 
instead of a normal prior led to a significant increase in the average RMSE(θ̂), regardless of latent 
ability distribution. 

There was a significant number of quadrature points by test length interaction for both 
normal and uniform prior distributions. Increasing the test length from 61 to 122 items 
produced a significant reduction in RMSE(θ̂) for all levels of the number of quadrature points 
factor, regardless of type of prior distribution used in ability estimation. For both types of prior 
distributions, increasing the number of quadrature points from 10 to 2√(test length) and from 10 
to 80 led to a significant reduction in the mean RMSE(θ̂), but increasing from 2√(test length) to 
80 quadrature points did not result in significantly more accurate θ̂s. 

Post hoc analyses of Bias(θ̂) are presented in Table 13. As can be seen, all significant 
differences among the levels of the number of quadrature points factor occurred when a 61-item 
test was used and reflected a reduction in the average Bias(θ̂) at the larger number of 
quadrature points relative to the mean Bias(θ̂) at the smaller number of quadrature points (cf. 
Table 5). Similarly, the significant differences between the 61- and 122-item tests were 
produced by Bias(θ̂) for the 122-item test being less than that for the 61-item test. 

Insert Table 13 about here 
CONCLUSIONS AND DISCUSSION 

While it may be argued by some that varying a and c is more realistic with respect to actual 
testing situations, this study used a 1PL model because doing so avoided a number of 
confounding issues that would needlessly complicate the study. A thought experiment may be 
sufficient to consider what may occur using models with varying item discrimination (as) and/or 
the pseudo-guessing parameter (cs). If a is allowed to increase from the study's value of 1.0, then 
given the inverse relationship between information (I(θ)) and SEE, the EAP SEEs would become 
smaller than those obtained in this study. However, because of the greater information available 
for ability estimation, the θ̂s would become more stable and the empirical SEE would also 
decrease. In short, the discrepancy between the EAP and empirical SEEs at 10 quadrature points 
would still exist. The use of items with low as is not considered meaningful because in practice 
items with low as are not considered desirable (i.e., most psychometricians prefer to use items 
which discriminate well and to increase I(θ) rather than to decrease it). Using slightly less 
informative items than used in the study, say 0.8 < a < 1.0, would increase the EAP SEE. However, 
these same items would make the θ̂s comparatively less accurate and thereby increase the 
empirical SEEs. The discrepancy between the EAP and empirical SEEs would still remain. 

Allowing c to increase from the study's value of 0.0 would have a similar impact. When c > 0.0 
the location of maximum I(θ) simply shifts to be higher than the item's difficulty value (b) and 
the amount of information available for estimation is lowered. Therefore, with increasing c the 
variance and standard error of estimation increase. There are two possible scenarios, with 
Scenario 1 requiring an assumption. Scenario 1 requires one to assume that by increasing c and 
thereby decreasing the information available for ability estimation it is still possible to obtain 
reasonably stable and accurate θ̂s (and not increase the empirical SEE). If this is true, then 
conceivably there is a value of c > 0.0 that will sufficiently increase the EAP SEE so that it agrees 
with the empirical SEE. That is, the goal is to construct a test using items that examinees have a 
large probability of correctly answering without knowing the correct answers (i.e., guessing) and 
still obtain accurate θ̂s for those examinees. Scenario 2 is that increasing c and thereby 
decreasing the information available for ability estimation results in unstable and inaccurate θ̂s. 
This instability and inaccuracy is reflected in a larger empirical SEE than would be obtained if c 
= 0.0. Therefore, there is no c which will sufficiently increase the EAP SEE so that it agrees with 
the empirical SEE because as c increases so does the empirical SEE. It is this latter issue which 
also addresses the use of "reasonable" cs of, say, less than 0.25. 

To summarize the results of our thought experiment, any nonzero c or a value of a < 1.0 will 
increase the empirical and EAP SEEs. Increasing a will decrease the EAP and empirical SEEs. In 
all cases the discrepancy between the EAP and empirical SEEs that was observed at 10 quadrature 
points will continue to exist. 
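
The direction of the effects in this thought experiment can be checked numerically. The sketch below is illustrative only: it assumes the standard three-parameter logistic item information function in the logistic metric (scaling constant D = 1, which the study may not have used), and the 61-item test with every difficulty at b = 0 is a hypothetical test, not the one used in the study. The function names are likewise introduced here for illustration.

```python
import math

def p_3pl(theta, a=1.0, b=0.0, c=0.0):
    """Probability of a correct response under the 3PL model (assumes D = 1)."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

def item_information(theta, a=1.0, b=0.0, c=0.0):
    """3PL item information; reduces to P * Q when a = 1 and c = 0 (the 1PL case)."""
    p = p_3pl(theta, a, b, c)
    q = 1.0 - p
    return a ** 2 * (q / p) * ((p - c) / (1.0 - c)) ** 2

def see_from_information(theta, items):
    """Information-based standard error: 1 / sqrt(sum of item informations)."""
    info = sum(item_information(theta, a, b, c) for a, b, c in items)
    return 1.0 / math.sqrt(info)

def location_of_max_information(a, b, c, lo=-4.0, hi=4.0, steps=8001):
    """Grid search for the theta at which a single item is most informative."""
    grid = [lo + (hi - lo) * k / (steps - 1) for k in range(steps)]
    return max(grid, key=lambda t: item_information(t, a, b, c))

# Hypothetical 61-item tests, all items located at b = 0.
baseline = [(1.0, 0.0, 0.0)] * 61      # a = 1, c = 0 (the study's 1PL values)
steeper = [(1.5, 0.0, 0.0)] * 61       # larger a: more information, smaller SEE
guessing = [(1.0, 0.0, 0.2)] * 61      # c > 0: less information, larger SEE

for label, items in [("baseline", baseline), ("a = 1.5", steeper), ("c = 0.2", guessing)]:
    print(label, round(see_from_information(0.0, items), 3))
```

This only illustrates the information-based SEE; whether the empirical SEE moves in step with it, as the scenarios above discuss, would require simulation.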

As mentioned above, Bock and Mislevy (1982) state that the PSD(θ̂) is virtually 
interchangeable with the standard error after about 20 items. Part of the support for this 
statement comes from their adaptive test simulation results, which were based on the use of 21 
quadrature points for estimation. This study showed that considering the PSD(θ̂) to be 
interchangeable with the standard error is questionable even with 122 items if the number of 
quadrature points is 10; given the trend in the data (see Figure 1) this is probably also true for 
fewer than 10 quadrature points. As the number of quadrature points increases it appears that 
considering PSD(θ̂) to be interchangeable with the standard error is reasonable. For example, 
given that RMSE = √(SEE² + Bias(θ̂)²), the agreement between the observed mean RMSE(θ̂) and the 
mean RMSEs based on the EAP and empirical SEEs was assessed (Table 14). As can be seen, when 
the number of quadrature points is 80 there is very good agreement between the observed mean 
RMSE(θ̂) and the RMSEs calculated on the basis of either the EAP SEE or the empirical SEE. 
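
The RMSE relationship can be illustrated with the cell means reported in Tables 3 and 5. The sketch below is a rough check only: it plugs mean SEEs and mean biases into the formula, which is not necessarily how the Table 14 values were computed, and the function and dictionary names are introduced here for illustration.

```python
import math

def rmse_from_see_and_bias(see, bias):
    """RMSE implied by a standard error and a bias: sqrt(SEE^2 + Bias^2)."""
    return math.sqrt(see ** 2 + bias ** 2)

# Cell means transcribed from Tables 3 and 5
# (bimodal theta distribution, normal prior, 61-item test).
cells = {
    10: {"emp_see": 0.248, "eap_see": 0.160, "bias": -0.026, "observed_rmse": 0.282},
    80: {"emp_see": 0.232, "eap_see": 0.239, "bias": -0.001, "observed_rmse": 0.238},
}

for n_points, cell in cells.items():
    from_eap = rmse_from_see_and_bias(cell["eap_see"], cell["bias"])
    from_emp = rmse_from_see_and_bias(cell["emp_see"], cell["bias"])
    print(f"{n_points} quadrature points: "
          f"EAP-based {from_eap:.3f}, empirical-based {from_emp:.3f}, "
          f"observed {cell['observed_rmse']:.3f}")
```

With these means, the EAP-based value at 80 quadrature points falls within about 0.001 of the observed mean RMSE, whereas at 10 quadrature points it falls well short of it.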

Insert Table 14 about here 



This study showed that when the purpose of assessment is to rank-order examinees in terms 
of ability, the use of 10 quadrature points provides very good agreement (i.e., r_θθ̂) between the 
EAP θ̂s and their corresponding θs for symmetric distributions. If there is reason to suspect that 
the latent ability distribution is skewed, then the use of 2√(test length) quadrature points may 
be called for. More accurate θ̂s (i.e., in terms of RMSE(θ̂) and Bias(θ̂)) may be obtained by 
increasing the test length as well as the number of quadrature points. Furthermore, Table 5 
showed that for a fixed test length the accuracy (mean RMSE(θ̂)) may be increased simply by 
increasing the number of quadrature points from 10 to 80. For example, the use of 80 quadrature 
points with a 61-item test produced RMSE(θ̂)s that were less than those of a test twice as long 
that used 10 quadrature points for estimation. 

Given the SEE difference, the diff68% and diff95%, and the RMSE = √(SEE² + Bias(θ̂)²) 
relationship analyses, it appears that the use of 10 quadrature points tends to result in EAP 
SEE(θ̂)s which underestimate the observed standard error. These SEEs give the false 
impression that θ is being estimated more accurately than, in fact, it is. Confidence 
intervals built from these SEEs will be erroneously narrow, and classification 
decisions based on such CIs will potentially be incorrect. For instance, examinees may be 
classified as masters (e.g., certified) because their (erroneously narrow) CIs fall above the 
standard. In these applications it is necessary to increase the number of quadrature points 
used in EAP estimation. A conservative approach would be to use 80 quadrature points 
because, overall, this level provided the best agreement between the CIs and their expected 
values. Clearly, there are situations where the use of 2√(test length) quadrature points may 
be reasonable given the test's length, the type of prior distribution used, and knowledge of θ's 
distribution. 
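
To make the role of the quadrature points concrete, the following is a minimal sketch of EAP estimation under a 1PL model, with the posterior standard deviation used to form a confidence interval for a mastery decision. Several details are assumptions made for illustration and are not taken from the study: equally spaced quadrature points on [-4, 4], prior weights proportional to the N(0, 1) density (or constant for a uniform prior), the logistic metric with D = 1, and the hypothetical function names `eap_estimate` and `classify`.

```python
import math

def eap_estimate(responses, difficulties, n_points=80, lo=-4.0, hi=4.0, prior="normal"):
    """Return (EAP ability estimate, posterior SD) for one 1PL response pattern.

    Assumptions: equally spaced nodes on [lo, hi]; N(0, 1) or uniform prior weights.
    """
    step = (hi - lo) / (n_points - 1)
    nodes = [lo + k * step for k in range(n_points)]
    posterior = []
    for x in nodes:
        weight = math.exp(-0.5 * x * x) if prior == "normal" else 1.0
        log_like = 0.0
        for u, b in zip(responses, difficulties):
            p = 1.0 / (1.0 + math.exp(-(x - b)))
            log_like += math.log(p if u else 1.0 - p)
        posterior.append(weight * math.exp(log_like))
    total = sum(posterior)
    theta_hat = sum(x * w for x, w in zip(nodes, posterior)) / total
    variance = sum((x - theta_hat) ** 2 * w for x, w in zip(nodes, posterior)) / total
    return theta_hat, math.sqrt(variance)

def classify(theta_hat, psd, cut, z=1.96):
    """Classify an examinee by whether the CI theta_hat +/- z * PSD clears the cut score."""
    lower, upper = theta_hat - z * psd, theta_hat + z * psd
    if lower > cut:
        return "master"
    if upper < cut:
        return "nonmaster"
    return "undetermined"

# Hypothetical 61-item test with difficulties spread evenly over [-2, 2].
difficulties = [-2.0 + 4.0 * k / 60 for k in range(61)]
responses = [1 if b < 0.5 else 0 for b in difficulties]  # hypothetical response pattern

for n_points in (10, 16, 80):
    theta_hat, psd = eap_estimate(responses, difficulties, n_points=n_points)
    print(n_points, round(theta_hat, 3), round(psd, 3), classify(theta_hat, psd, cut=0.0))
```

Because the interval width is proportional to the PSD, any understatement of the PSD at a small number of quadrature points carries directly into the classification.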

When a CAT using EAP ability estimation is terminated using the standard error criterion, it 
appears necessary to use about 80 quadrature points if the adaptive test is to be validly 
terminated, regardless of latent θ distribution. This is also true if the EAP SEE will be used to 
estimate the reliability coefficient; Bock & Mislevy (1982) state that 1 - PSD(θ̂)² is the 
reliability coefficient for the EAP θ̂. If it is reasonable to assume a bimodal or uniform θ 
distribution, then the use of 2√(test length) quadrature points with a normal prior distribution 
appears to be sufficient for accurate EAP SEEs. However, because of the interaction between test 
length, number of quadrature points, and EAP SEE, shorter tests may require more than 
2√(test length) quadrature points to obtain accurate EAP SEEs. Given the observed 
r_θθ̂s with 10 quadrature points, it may be permissible to use 10 quadrature points in nonadaptive 
testing situations if the EAP SEE will not be used. 
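
As a small worked example of the two quantities mentioned above, the sketch below computes the 2√(test length) number of quadrature points and the reliability implied by a given PSD. Rounding up to a whole number of points is an assumption; it is chosen because it reproduces the 16 and 23 points listed in Table 1 for the 61- and 122-item tests. The function names are hypothetical.

```python
import math

def quadrature_points_for(test_length):
    """2 * sqrt(test length), rounded up to a whole number of points (assumed rounding)."""
    return math.ceil(2.0 * math.sqrt(test_length))

def eap_reliability(psd):
    """Reliability implied by a posterior SD via the 1 - PSD^2 relation cited above."""
    return 1.0 - psd ** 2

for n_items in (61, 122):
    print(n_items, "items ->", quadrature_points_for(n_items), "quadrature points")

# Example with a mean EAP SEE from Table 3 (bimodal theta, normal prior, 61 items, 80 points).
print(round(eap_reliability(0.239), 3))
```

An understated PSD inflates the reliability computed this way, which is why the number of quadrature points matters for this use as well.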

Table 1: Descriptive statistics on θ̂s and θs. 

                                         Test Length 61             Test Length 122 
Latent θ                                Quadrature Points          Quadrature Points 
Distribution    Prior     Statistic     10       16       80       10       23       80        θ 

Bimodal         Normal    Mean        -0.026   -0.011   -0.001   -0.019   -0.001   -0.001    0.000 
                          Median      -0.619   -0.405   -0.296   -0.729   -0.329   -0.314   -0.308 
                          SD           0.998    0.964    0.951    1.018    0.981    0.978    1.056 
                          Skew         0.265    0.234    0.220    0.296    0.221    0.220    0.220 
                Uniform   Mean        -0.026   -0.010   -0.001   -0.021   -0.001   -0.001 
                          Median      -0.620   -0.414   -0.313   -0.729   -0.338   -0.323 
                          SD           1.052    1.021    1.010    1.048    1.011    1.008 
                          Skew         0.259    0.234    0.221    0.285    0.221    0.220 

Normal          Normal    Mean        -0.068   -0.072   -0.074   -0.067   -0.076   -0.076   -0.079 
                          Median      -0.041   -0.026   -0.017   -0.042   -0.092   -0.014   -0.010 
                          SD           1.002    0.943    0.924    1.037    0.954    0.949    0.977 
                          Skew         0.025    0.010    0.006    0.002    0.014    0.002    0.005 
                Uniform   Mean        -0.074   -0.077   -0.079   -0.070   -0.079   -0.079 
                          Median      -0.041   -0.026   -0.018   -0.042   -0.089   -0.015 
                          SD           1.054    1.004    0.987    1.064    0.986    0.980 
                          Skew         0.021    0.014    0.012   -0.001    0.015    0.005 

Positive Skew   Normal    Mean        -0.055   -0.010    0.010   -0.060    0.002    0.011    0.014 
                          Median      -0.361   -0.229   -0.156   -0.488   -0.212   -0.158   -0.156 
                          SD           0.826    0.745    0.716    0.860    0.740    0.734    0.754 
                          Skew         0.854    1.056    1.141    1.157    1.156    1.143    1.157 
                Uniform   Mean        -0.046   -0.006    0.012   -0.053    0.004    0.012 
                          Median      -0.361   -0.233   -0.164   -0.488   -0.215   -0.163 
                          SD           0.860    0.790    0.763    0.876    0.764    0.758 
                          Skew         0.916    1.094    1.169    0.874    1.168    1.157 

Uniform         Normal    Mean        -0.178   -0.182   -0.185   -0.185   -0.192   -0.193   -0.199 
                          Median      -0.733   -0.520   -0.429   -0.792   -0.432   -0.430   -0.438 
                          SD           1.606    1.595    1.591    1.652    1.642    1.641    1.695 
                          Skew         0.174    0.181    0.184    0.173    0.181    0.182    0.180 
                Uniform   Mean        -0.193   -0.197   -0.199   -0.193   -0.200   -0.200 
                          Median      -0.734   -0.537   -0.454   -0.792   -0.444   -0.443 
                          SD           1.733    1.723    1.717    1.721    1.708    1.706 
                          Skew         0.169    0.175    0.178    0.170    0.178    0.178 

Table 2: Fidelity coefficients. 

                              Test Length 61                    Test Length 122 
Latent θ                     Quadrature Points                 Quadrature Points 
Distribution    Prior       10     2√length(a)    80          10     2√length(a)    80 

Bimodal         Normal    0.9910     0.9990     1.0000      0.9865     0.9998     1.0000 
                Uniform   0.9935     0.9993     1.0000      0.9886     0.9998     1.0000 
Normal          Normal    0.9835     0.9978     0.9999      0.9765     0.9994     1.0000 
                Uniform   0.9877     0.9984     0.9999      0.9793     0.9995     1.0000 
Positive Skew   Normal    0.9749     0.9961     0.9999      0.9587     0.9987     0.9999 
                Uniform   0.9798     0.9971     0.9999      0.9620     0.9989     0.9999 
Uniform         Normal    0.9975     0.9997     1.0000      0.9960     0.9999     1.0000 
                Uniform   0.9983     0.9998     1.0000      0.9966     0.9999     1.0000 

(a) length = test length 

Table 3: Mean EAP, Empirical, and Difference SEEs(a) 

                                       Test Length 61                 Test Length 122 
Latent θ                              Quadrature Points              Quadrature Points 
Distribution    Prior     SEE(a)     10    2√length(b)   80         10    2√length(b)   80 

Bimodal         Normal    Emp      0.248     0.227     0.232      0.202     0.166     0.169 
                          EAP      0.160     0.225     0.239      0.093     0.167     0.171 
                          Diff     0.087     0.003    -0.007      0.108    -0.001    -0.003 
                Uniform   Emp      0.263     0.243     0.247      0.209     0.172     0.174 
                          EAP      0.175     0.234     0.246      0.098     0.170     0.174 
                          Diff     0.088     0.008     0.001      0.111     0.002     0.000 

Normal          Normal    Emp      0.302     0.249     0.234      0.236     0.178     0.170 
                          EAP      0.155     0.224     0.240      0.081     0.166     0.172 
                          Diff     0.147     0.025    -0.007      0.155     0.012    -0.002 
                Uniform   Emp      0.314     0.263     0.250      0.240     0.184     0.176 
                          EAP      0.167     0.234     0.248      0.084     0.169     0.175 
                          Diff     0.147     0.029     0.002      0.157     0.015     0.001 

Positive Skew   Normal    Emp      0.343     0.263     0.234      0.268     0.176     0.170 
                          EAP      0.152     0.222     0.239      0.076     0.162     0.171 
                          Diff     0.191     0.041    -0.006      0.192     0.013    -0.002 
                Uniform   Emp      0.353     0.275     0.249      0.273     0.181     0.175 
                          EAP      0.162     0.231     0.247      0.079     0.166     0.174 
                          Diff     0.191     0.044     0.002      0.194     0.015     0.001 

Uniform         Normal    Emp      0.275     0.245     0.240      0.234     0.180     0.177 
                          EAP      0.203     0.244     0.250      0.120     0.178     0.181 
                          Diff     0.072     0.001    -0.010      0.114     0.002    -0.004 
                Uniform   Emp      0.303     0.272     0.266      0.248     0.190     0.187 
                          EAP      0.224     0.259     0.263      0.130     0.183     0.186 
                          Diff     0.079     0.013     0.002      0.118     0.007     0.002 

(a) Emp = SEE(empirical); EAP = SEE(EAP); Diff = (SEE(empirical) - SEE(EAP)); (b) length = test length 

Table 4: Descriptive statistics on 68% and 95% confidence intervals(a) 

                                        Test Length 61                 Test Length 122 
Latent θ                               Quadrature Points              Quadrature Points 
Distribution    Prior     Statistic   10    2√length(b)   80         10    2√length(b)   80 

68% confidence interval 

Bimodal         Normal    Mean      335.5     663.8     679.4      177.1     675.9     686.9 
                          SD        214.2     156.0      44.0      132.2      85.8      34.9 
                Uniform   Mean      372.6     658.0     684.5      186.0     675.7     686.1 
                          SD        230.5     143.5      44.2      135.6      75.8      35.4 
Normal          Normal    Mean      289.5     582.6     685.1      120.1     609.5     684.8 
                          SD        224.6     209.0      49.0      118.1     166.0      41.1 
                Uniform   Mean      305.2     586.1     684.0      136.4     615.7     677.4 
                          SD        237.0     205.8      44.3      130.5     166.3      42.8 
Positive Skew   Normal    Mean      238.0     540.1     677.6      103.9     608.1     685.9 
                          SD        203.7     228.6      51.7      134.6     185.0      43.4 
                Uniform   Mean      269.9     543.5     672.8      113.8     610.3     680.7 
                          SD        236.0     221.1      52.6      147.9     186.8      39.9 
Uniform         Normal    Mean      434.1     630.0     664.3      242.8     652.3     669.8 
                          SD        236.8     138.0      57.1      148.1     111.0      40.0 
                Uniform   Mean      463.1     649.6     675.8      263.7     654.8     682.8 
                          SD        234.4     135.3      51.0      153.2     114.4      38.7 

95% confidence interval 

Bimodal         Normal    Mean      496.9     904.6     948.8      ?80.6     934.7     950.2 
                          SD        248.5     113.1      16.5      K'7.3      71.6      11.5 
                Uniform   Mean      549.6     918.0     950.8      291.0     935.0     950.1 
                          SD        254.3     105.3      13.0      184.5      71.2      11.0 
Normal          Normal    Mean      413.2     844.7     949.6      208.8     868.7     948.7 
                          SD        271.5     173.5      16.0      181.6     159.3      15.7 
                Uniform   Mean      458.0     858.0     948.5      225.1     872.6     948.9 
                          SD        292.1     166.0      15.7      189.6     153.8      14.7 
Positive Skew   Normal    Mean      345.4     788.7     952.3      176.9     881.8     947.8 
                          SD        243.3     207.6      16.0      178.0     145.4      16.3 
                Uniform   Mean      380.0     808.7     948.3      188.3     884.5     946.4 
                          SD        271.1     201.2      15.3      183.0     144.8      15.5 
Uniform         Normal    Mean      634.3     893.7     935.0      383.3     919.8     942.5 
                          SD        261.6     122.6      27.5      210.0      87.2      15.3 
                Uniform   Mean      676.4     913.2     949.3      411.0     925.6     948.1 
                          SD        268.3     124.3      14.4      209.6      88.4      9 ' 

(a) SD = standard deviation; (b) length = test length 

Table 5: Descriptive statistics on RMSE(θ̂) and Bias(θ̂)(a). 

                                        Test Length 61                 Test Length 122 
Latent θ                               Quadrature Points              Quadrature Points 
Distribution    Prior     Statistic   10    2√length(b)   80         10    2√length(b)   80 

RMSE(θ̂) 

Bimodal         Normal    Mean      0.282     0.235     0.238      0.265     0.169     0.171 
                          SD        0.122     0.044     0.009      0.137     0.020     0.006 
                Uniform   Mean      0.293     0.246     0.247      0.267     0.173     0.174 
                          SD        0.114     0.040     0.008      0.135     0.019     0.005 
Normal          Normal    Mean      0.352     0.258     0.239      0.328     0.183     0.172 
                          SD        0.186     0.070     0.016      0.207     0.042     0.011 
                Uniform   Mean      0.361     0.270     0.250      0.330     0.187     0.176 
                          SD        0.177     0.065     0.013      0.204     0.041     0.010 
Positive Skew   Normal    Mean      0.400     0.272     0.237      0.379     0.180     0.171 
                          SD        0.206     0.079     0.013      0.228     0.042     0.010 
                Uniform   Mean      0.410     0.284     0.249      0.382     0.185     0.175 
                          SD        0.195     0.074     0.011      0.225     0.040     0.009 
Uniform         Normal    Mean      0.311     0.267     0.261      0.280     0.188     0.185 
                          SD        0.121     0.051     0.032      0.142     0.029     0.020 
                Uniform   Mean      0.321     0.275     0.267      0.284     0.192     0.188 
                          SD        0.115     0.049     0.028      0.137     0.030     0.021 

Bias(θ̂) 

Bimodal         Normal    Mean     -0.026    -0.011    -0.001     -0.019    -0.001    -0.001 
                          SD        0.135     0.060     0.055      0.167     0.031     0.028 
                Uniform   Mean     -0.026    -0.010    -0.001     -0.021    -0.001    -0.001 
                          SD        0.127     0.042     0.009      0.161     0.019     0.007 
Normal          Normal    Mean      0.011     0.007     0.005      0.012     0.003     0.003 
                          SD        0.181     0.072     0.054      0.226     0.041     0.029 
                Uniform   Mean      0.005     0.002     0.000      0.009     0.000     0.000 
                          SD        0.177     0.062     0.014      0.225     0.033     0.009 
Positive Skew   Normal    Mean     -0.070    -0.024    -0.005     -0.074    -0.012    -0.003 
                          SD        0.191     0.067     0.040      0.254     0.040     0.023 
                Uniform   Mean     -0.060    -0.020    -0.002     -0.067    -0.011    -0.002 
                          SD        0.193     0.069     0.014      0.255     0.037     0.009 
Uniform         Normal    Mean      0.021     0.016     0.014      0.014     0.007     0.006 
                          SD        0.147     0.108     0.105      0.156     0.057     0.055 
                Uniform   Mean      0.006     0.002     0.000      0.006    -0.001    -0.001 
                          SD        0.108     0.046     0.027      0.142     0.026     0.015 

(a) SD = standard deviation; (b) length = test length 

Table 6: Repeated Measures Analysis of SEE difference (SEEempirical - SEEasymptotic) 

Source                                  ν1     ν2          F 

Latent(a)                                3    396      13.26** 
Length(b)                                1    396       5.51* 
QuadPts(c)                               2    395     716.30** 
Prior(d)                                 1    396     363.67** 
Latent X Length                          3    396      18.51** 
Latent X QuadPts                         6    790       8.38** 
Latent X Prior                           3    396      25.40** 
Length X QuadPts                         2    395      48.39** 
Length X Prior                           1    396     188.15** 
QuadPts X Prior                          2    395     164.55** 
Latent X Length X QuadPts                6    790      11.47** 
Latent X Length X Prior                  3    396      30.05** 
Latent X QuadPts X Prior                 6    790      14.67** 
Length X QuadPts X Prior                 2    395     236.59** 
Latent X Length X QuadPts X Prior        6    790      16.25** 

(a) Latent Distribution; (b) Test Length; (c) Number of Quadrature Points; 
(d) Prior Distribution; * p < 0.05, ** p < 0.01 

Table 7: Post Hoc Analyses (t-tests) for SEE difference (SEEempirical - SEEEAP)(a) 

                                                       Prior Distribution 
                                             Normal                      Uniform 
Latent θ                                  Test Length                  Test Length 
Distribution    Hypotheses                61         122               61         122 

Bimodal         μ10 vs μ2√length       12.58**    16.16**           11.82**    16.25** 
                μ10 vs μ80             13.94**    16.49**           12.93**    16.53** 
                μ2√length vs μ80        1.36       0.32              1.11       0.28 
Normal          μ10 vs μ2√length       11.35**    13.20**           10.96**    13.23** 
                μ10 vs μ80             14.27**    14.58**           13.56**    14.54** 
                μ2√length vs μ80        2.92**     1.37              2.59**     1.30 
Positive Skew   μ10 vs μ2√length       12.28**    14.70**           12.17**    14.85** 
                μ10 vs μ80             16.15**    15.92**           15.68**    16.01** 
                μ2√length vs μ80        3.86**     1.22              3.51**     1.16 
Uniform         μ10 vs μ2√length        9.53**    14.98**            8.99**    15.08** 
                μ10 vs μ80             11.05**    15.72**           10.45**    15.80** 
                μ2√length vs μ80        1.52       0.74              1.46       0.72 

                                                            Latent Distribution 
Prior           Quadrature                                              Positive 
Distribution    Points(a)     Hypotheses      Bimodal      Normal       Skew        Uniform 

Normal          10            μ61 vs μ122     -6.01**      -1.83        -0.28      -12.39** 
                2√length                       0.91         3.03**       6.40**     -0.26 
                80                            -1.10        -1.02        -0.91       -1.98* 
Uniform         10            μ61 vs μ122     -6.89**      -2.43**      -0.78      -12.37** 
                2√length                       1.98*        3.75**       6.84**      2.07* 
                80                             0.31         0.24         0.16        0.31 

(a) length = test length; * p < 0.05, ** p < 0.01 

Table 8: Repeated Measures Analysis of diff68% and diff95%. 

Source                                  ν1     ν2     F(diff68%)    F(diff95%) 

Latent(a)                                3    396       21.41**       28.70** 
Length(b)                                1    396      149.38**      288.73** 
QuadPts(c)                               2    395     1134.64**     1339.62** 
Prior(d)                                 1    396       34.76**      209.88** 
Latent X Length                          3    396        4.28**        7.57** 
Latent X QuadPts                         6    790       12.38**       15.84** 
Latent X Prior                           3    396        2.71*         3.65* 
Length X QuadPts                         2    395      263.00**      443.28** 
Length X Prior                           1    396        6.17*        71.18** 
QuadPts X Prior                          2    395       26.14**       82.90** 
Latent X Length X QuadPts                6    790        2.17*         5.10** 
Latent X Length X Prior                  3    396        0.51          1.04 
Latent X QuadPts X Prior                 6    790        1.72          2.14* 
Length X QuadPts X Prior                 2    395        1.95         27.91** 
Latent X Length X QuadPts X Prior        6    790        2.08          2.15* 

(a) Latent Distribution; (b) Test Length; (c) Number of Quadrature Points; (d) Prior Distribution; 
* p < 0.05, ** p < 0.01 